Deep Learning the Indus Script
Abstract
Standardized corpora of undeciphered scripts, a necessary starting point for computational epigraphy, require laborious human effort for their preparation from raw archaeological records. Automating this process through machine learning algorithms can be of significant aid to epigraphical research. Here, we take the first steps in this direction and present a deep learning pipeline that takes as input images of the undeciphered Indus script, as found in archaeological artifacts, and returns as output a string of graphemes, suitable for inclusion in a standard corpus. The image is first decomposed into regions using Selective Search and these regions are classified as containing textual and/or graphical information using a convolutional neural network. Regions classified as potentially containing text are hierarchically merged and trimmed to remove non-textual information. The remaining textual part of the image is segmented using standard image processing techniques to isolate individual graphemes. This set is finally passed to a second convolutional neural network to classify the graphemes, based on a standard corpus. The classifier can identify the presence or absence of the most frequent Indus grapheme, the “jar” sign, with an accuracy of 92%. Our results demonstrate the great potential of deep learning approaches in computational epigraphy and, more generally, in the digital humanities.
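The abstract does not say which "standard image processing techniques" are used to isolate graphemes. As one illustration only, the sketch below assumes a vertical projection profile applied to a binarized text strip: columns containing no ink are treated as gaps between graphemes. The function name `segment_columns` and the `min_gap` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def segment_columns(binary: List[List[int]], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column spans, end exclusive, that contain ink.

    binary  -- rows of 0/1 pixels, where 1 marks ink (assumed input format).
    min_gap -- spans separated by fewer than this many empty columns are
               merged, so a grapheme with a thin internal gap is not split.
    """
    if not binary:
        return []
    width = len(binary[0])

    # Projection profile: number of ink pixels in each column.
    profile = [sum(row[x] for row in binary) for x in range(width)]

    # Collect maximal runs of columns that contain at least one ink pixel.
    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, width))

    # Merge neighbouring spans whose separating gap is narrower than min_gap.
    merged: List[Tuple[int, int]] = []
    for span in spans:
        if merged and span[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], span[1])
        else:
            merged.append(span)
    return merged
```

Each returned span could then be cropped and passed to a grapheme classifier. A real seal image would also need binarization and noise removal first, which this sketch leaves out.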
Related resources
Clustering Indus Texts using K-means
One of the most important undeciphered scripts of the ancient world is the Indus script. Earlier studies had focused on the correlations between signs in the Indus texts using various statistical and computational techniques such as N-grams or Markov chains. In the present study, K-means clustering, an unsupervised machine learning technique, is used to identify clusters of similar texts without...
Indus Script: A Study of its Sign Design
The Indus script is an undeciphered script of the ancient world. In spite of numerous attempts over several decades, the script has defied universally acceptable decipherment. In a recent series of papers (Yadav et al. 2010; Rao et al. 2009a, b; Yadav et al. 2008a, b) we have analysed the sequences of Indus signs, which demonstrate the presence of a rich syntax and logic in its structure. Here we fo...
A Markov model of the Indus script.
Although no historical information exists about the Indus civilization (flourished ca. 2600-1900 B.C.), archaeologists have uncovered about 3,800 short samples of a script that was used throughout the civilization. The script remains undeciphered, despite a large number of attempts and claimed decipherments over the past 80 years. Here, we propose the use of probabilistic models to analyze the ...
A Markov Model of the 4500-year-old Indus Script
Although no historical information exists about the Indus civilization (fl. c. 2600-1900 BC), archaeologists have uncovered about 3800 short samples of a script that was used throughout the civilization. The script remains undeciphered, despite a large number of attempts and claimed decipherments over the past 80 years. Here, we propose the use of probabilistic models to analyze the structure o...
Indus script corpora, archaeo-metallurgy and Meluhha (Mleccha)
has to be expanded further to provide for a study of evolution and formation of Indian languages in the Indian language union (sprachbund). The paper analyses the stages in the evolution of early writing systems which began with the evolution of counting in the ancient Near East. Providing an example from the Indian Hieroglyphs used in Indus Script as a writing system, a stage anterior to the s...
Journal: CoRR
Volume: abs/1702.00523
Publication date: 2017